This tutorial demonstrates the R package cpam for the
analysis of time series omics data. It reproduces the results for the
second case study presented in the manuscript Shape-constrained,
changepoint additive models for time series omics data with
cpam.
The data are oublically available
You can install the package from GitHub using:
First we create the experimental design which must have at least the following columns: time, sample, and path. The rep column is optional and is used here to generate the sample names. We have already run kallisto to quantify transcript abundances (with 100 bootstraps), and here the path column contains the path to the abundance file for each sample.
ed <-
expand_grid(time = c(0,30,60,67.5,75,90,120), rep = 1:3) %>%
mutate(sample = paste0("pc_t",time,"_r",rep),
path = paste0("case_studies/crisp/data/kallisto/",sample,"/abundance.h5"))
edThe transcript-to-gene mapping for Arabidopsis thaliana can be downloaded from TAIR (…). We load this now
To fit the models, we first prepare the cpam object, then compute p-values, estimate the changepoint, and select the shape of each transcript. The last step takes the longest (here just under 13 minutes) but it is worth the wait to be able to visualise and cluster the transcripts by shape.
cpo <- prepare_cpam(exp_design = ed,
model_type = "case-only",
t2g = t2g,
import_type = "kallisto",
num_cores = 4)
cpo <- compute_p_values(cpo) # 1:52
cpo <- estimate_changepoint(cpo) # 6:32 secs
cpo <- select_shape(cpo) # 12:54 secs
We can look at a summary of the fitted cpam object
cpo
## cpam object
## -----------
## case-only time series
## 21 samples
## 7 time points
## Overdispersion estimated using 100 inferential replicates
## Counts rescaled by estimated overdispersionAlthough it cannot be launched here on the tutorial page, you can
access the Shiny app to visualise the results by running the following
code visualise(cpo).
The results of the analysis are summarised using the
results function.
The generated results list can be filtered by specifying minimum counts, minimum log-fold changes, and maximum p-values. For example, to return only the transcripts with a log-fold change greater than 2, at least 50 counts, and a p-value less than 0.01, we can run
A single gene can be plotted using the plot_cpam
function. Here we plot the gene AT1G64140
The
substitle shows (0,tp) indicating a changepoint at time point 0 (or no
changepoint) and an unconstained ‘tp’ (thinplate) shape. This selection
of ‘tp’ suggest that the trend for this gene does not conform to one of
the simpler shape types that cpam uses. To force the cpam to choose
among the simpler forms, we set
shape_type = "shape2" in
the plot_cpam function.
Here
a concave shape (‘cv’) is chosen, and we can see this deviates from the
data more that the ‘tp’ shape.
Next we plot a gene with multiple transcripts, some of which have changepoints different from 0.
The first transcript has a changepoint at 30 mins and the second at 0.
Both have an unconstrained shape. Changepoint, shape and other results
from the fitted models can also be extracted manually from the cpam
object. For example, to extract the shape of the transcripts
Lastly, we plot the two remaining genes shown in the manuscript, AT4G34590 and AT3G23280.
The results function can be used to generate clusters according to
selected filters. Here we generate clusters of transcripts with at least
100 counts, a log-fold change greater than 1, and a p-value less than
0.01. The plot_cluster function can then be used to
visualise the clusters which we do here for targets with changepoints at
30 mins and the ‘micv’ (montonic increasing concave) shape.
res <- results(cpo, min_count = 100, min_lfc = 1, p_threshold = 0.01)
plot_cluster(cpo, res, changepoints = 30, shapes = "micv")
Clustering can be further refined based on, for example, the rate at
which the above transcripts attain their maximum values. We illustrate
refinements such as this in another case study here.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Pop!_OS 22.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Hobart
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] stringr_1.5.1 tidyr_1.3.1 readr_2.1.5 dplyr_1.1.4 cpam_0.0.0.9000
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 generics_0.1.3 bspm_0.5.7 stringi_1.8.4 lattice_0.22-6 hms_1.1.3
## [7] digest_0.6.37 magrittr_2.0.3 evaluate_0.24.0 grid_4.4.2 RColorBrewer_1.1-3 fastmap_1.2.0
## [13] jsonlite_1.8.9 Matrix_1.7-1 mgcv_1.9-1 purrr_1.0.2 scales_1.3.0 jquerylib_0.1.4
## [19] cli_3.6.3 rlang_1.1.5 aggregation_1.0.1 munsell_0.5.1 splines_4.4.2 scam_1.2-18
## [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10 tools_4.4.2 tzdb_0.4.0 colorspace_2.1-1
## [31] ggplot2_3.5.1 vctrs_0.6.5 R6_2.5.1 matrixStats_1.5.0 lifecycle_1.0.4 pkgconfig_2.0.3
## [37] pillar_1.10.1 bslib_0.8.0 gtable_0.3.6 glue_1.8.0 xfun_0.46 tibble_3.2.1
## [43] tidyselect_1.2.1 highr_0.11 rstudioapi_0.17.0 knitr_1.48 farver_2.1.2 htmltools_0.5.8.1
## [49] nlme_3.1-166 rmarkdown_2.27 labeling_0.4.3 compiler_4.4.2